Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Abstract

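Personalizing a speech synthesis system is a highly desired application, where the system can generate speech in the user's voice from only a few enrolled recordings. There are two main approaches to building such a system in recent works: speaker adaptation and speaker encoding. On the one hand, speaker adaptation methods fine-tune a trained multi-speaker text-to-speech (TTS) model with a few enrolled samples. However, they require at least thousands of fine-tuning steps for high-quality adaptation, making them hard to apply on devices. On the other hand, speaker encoding methods encode enrollment utterances into a speaker embedding. The trained TTS model can then synthesize the user's speech conditioned on the corresponding speaker embedding. Nevertheless, the speaker encoder suffers from a generalization gap between seen and unseen speakers. In this paper, we propose applying a meta-learning algorithm to the speaker adaptation method. More specifically, we use Model Agnostic Meta-Learning (MAML) as the training algorithm of a multi-speaker TTS model, which aims to find a good meta-initialization from which the model can adapt to any few-shot speaker adaptation task quickly. Therefore, we can also adapt the meta-trained TTS model to unseen speakers efficiently. Our experiments compare the proposed method (Meta-TTS) with two baselines: a speaker adaptation baseline and a speaker encoding baseline. The evaluation results show that Meta-TTS can synthesize speech with high speaker similarity from a few enrollment samples, with fewer adaptation steps than the speaker adaptation baseline, and that it outperforms the speaker encoding baseline under the same training scheme. When the speaker encoder of the baseline is pre-trained with data from an extra 8371 speakers, Meta-TTS can still outperform the baseline on the LibriTTS dataset and achieve comparable results on the VCTK dataset.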
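To make the idea of a meta-initialization concrete, here is a minimal sketch of MAML, the training algorithm named in the abstract. It is not the paper's TTS model. It assumes a toy scalar regression problem in which each "task" (standing in for a speaker) is a line y = a * x with its own slope, and the model has a single weight w. All function names and hyperparameters below are hypothetical and chosen only for illustration.

```python
import random

def loss(w, xs, ys):
    """Mean squared error of the 1-D model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def hessian(xs):
    """Second derivative of the mean squared error with respect to w."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def sample_task(rng, n=5):
    """Toy task (assumption): y = a * x with slope a drawn from [2, 4].

    Returns a support set (used for adaptation) and a query set
    (used to evaluate the adapted weight).
    """
    a = rng.uniform(2.0, 4.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    ys = [a * x for x in xs]
    return (xs[:n], ys[:n]), (xs[n:], ys[n:])

def adapt(w, xs, ys, inner_lr, steps=1):
    """Inner loop: a few gradient steps on one task's support set."""
    for _ in range(steps):
        w = w - inner_lr * grad(w, xs, ys)
    return w

def maml_train(meta_steps=2000, inner_lr=0.1, outer_lr=0.01, seed=0):
    """Outer loop: learn an initial w that adapts well after one inner step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        (sx, sy), (qx, qy) = sample_task(rng)
        w_adapted = adapt(w, sx, sy, inner_lr)
        # Chain rule through the single inner step:
        # d(w_adapted)/dw = 1 - inner_lr * hessian(support set).
        meta_grad = grad(w_adapted, qx, qy) * (1.0 - inner_lr * hessian(sx))
        w -= outer_lr * meta_grad
    return w

if __name__ == "__main__":
    w_meta = maml_train()
    (sx, sy), _ = sample_task(random.Random(123))
    w_new = adapt(w_meta, sx, sy, inner_lr=0.1)
    print("meta-initialization:", w_meta)
    print("support loss before/after adaptation:",
          loss(w_meta, sx, sy), loss(w_new, sx, sy))
```

In this sketch the outer update differentiates the query loss through the inner gradient step, which is what distinguishes MAML from ordinary multi-task training. In a real multi-speaker TTS setting the scalar weight would be replaced by the network parameters and the derivatives would come from an automatic differentiation library.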

Similar resources

Meta-SGD: Learning to Learn Quickly for Few Shot Learning

Few-shot learning is challenging for learning algorithms that learn each task in isolation and from scratch. In contrast, meta-learning learns from many related tasks a meta-learner that can learn a new task more accurately and faster with fewer examples, where the choice of meta-learners is crucial. In this paper, we develop Meta-SGD, an SGD-like, easily trainable meta-learner that can initial...

Few-Shot Learning with Meta Metric Learners

Existing few-shot learning approaches are based on either meta-learning or metric learning, which would suffer if the tasks have varying numbers of classes and/or the tasks diverge significantly. We propose meta metric learning to deal with the limitations of the existing few-shot learning approaches. Our meta metric learning approach consists of two components, task-specific learners that explo...

Meta-Learning for Semi-Supervised Few-Shot Classification

In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained on episodes representing different classification problems, each with a small labeled training set and its corres...

Learning speaker-specific phrase breaks for text-to-speech systems

The objective of this paper is to investigate whether prosodic phrase breaks are specific to a speaker and, if so, to propose a mechanism for learning speaker-specific phrase breaks from the speech database. Another equally important aspect dealt with in this work is to demonstrate the usefulness of these speaker-specific phrase breaks for a text-to-speech system. Experiments are carried out on two diff...

16 Text-to-Speech (TTS) Synthesis

AT&T Laboratories 16.

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2022.3167258